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Abstract 

Background: Despite our increasing recognition of the mechanisms that specify and propagate epigenetic 
states of gene expression, the pattern of how epigenetic modifications contribute to the overall genetic variation 
of a phenotypic trait remains largely elusive. 

Results: We construct a quantitative model to explore the effect of epigenetic modifications that occur at specific 
rates on the genome. This model, derived from, but beyond, the traditional quantitative genetic theory that is 
founded on Mendel's laws, allows questions concerning the prevalence and importance of epigenetic variation 
to be incorporated and addressed. 

Conclusions: It provides a new avenue for bringing chromatin inheritance into the realm of complex traits, 
facilitating our understanding of the means by which phenotypic variation is generated. 



Background 

Systematic or stochastic changes in chromatin states, 
such as DNA methylation, chromatin remodeling, his- 
tone modification and RNA interference, have been 
thought to provide an additional driving force for 
phenotypic variation in complex traits and diseases [1-9]. 
Different chromatin states, called epialleles, that occur in 
the same sequence allele cannot be captured by an ana- 
lysis based on DNA sequence alone [10]. With the 
increasing availability of epigenome technologies, there 
has been an unprecedented opportunity to understand 
the role of epiallelic variants in maintaining and inducing 
functional variation for organisms to better buffer against 
environmental perturbations. This hence entails the de- 
velopment of quantitative models that can enable our 
knowledge about the amount and pattern of quantitative 
variation determined by epialleles. By integrating with 
linkage or association mapping strategies, these models 
can retrieve epigenetic variation that cannot be estimated 
presently [10-13]. 

There have been several publications on methodological 
development for epigenetic detection [14-17]. Johannes 
and Colome-Tatche [16] proposed an experimental 
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approach for estimating epigenetic variation in experi- 
mental crosses derived from epigenomically perturbed 
isogenic lines. This approach is powered to model the 
effects of epiallelic instability, recombination, parent- 
of-origin effects, and transgressive segregation on pheno- 
typic variation across generations. Tal et al. [15] derived 
an expression form for covariances between relatives due 
to epigenetic transmissibility. A statistical model based on 
multiple testing procedures has been developed to iden- 
tify the genomic regions of epigenetic variability among 
different individuals from genome -wide DNA methylation 
data [18]. These model developments, in a combination 
with empirical studies, can be used to test the hypothesis 
that epigenetic variation arising from chromatin modifi- 
cations of DNA directly or indirectly is an important 
contributor to the missing heritability [17,19]. 

Despite these advances, we are still unclear how much 
of the phenotypic variation is contributed by epigenetic 
modifications and, more importantly, through which 
way epialleles trigger their effects on phenotypic values. 
The motivation of this article is to develop a quantitative 
model for estimating and testing the contribution of 
epigenetic variants to quantitative trait variation. The 
model allows the prediction of how much genetic vari- 
ation is produced through a change in the rate of occur- 
rence of epigenetic mutation and the effect of epigenetic 
factors in a natural population. We particularly discuss 
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how the epigenetic effect interacts with other genetic 
effects, such as additive and dominant, to affect pheno- 
typic traits. By implementing it into genome-wide asso- 
ciation studies [19], the model proposed provides useful 
guidance for designing efficient and effective molecular 
experiments to characterize a comprehensive picture of 
the epigenetic variation of complex traits or diseases in 
different organisms. 

Model 

Occurrence rate of methylation 

Consider an epigenetic study population of n individuals 
that are randomly drawn from a natural population, in 
which a nucleotide site, with two alleles A x and A 2 , is 
thought to affect a phenotypic trait. Let p and q (p + q = 1) 
denote the allele frequencies of A 1 and A 2 in the nat- 
ural population at Hardy- Weinberg equilibrium (HWE), 
respectively. The genotypic frequencies of A ± A 1} A 1 A 2 , 
and A 2 A 2 at the nucleotide site studied are expressed as 
p 2 , 2pq, and q 2 , respectively [20,21]. 

At the nucleotide site studied, some cytosines within a 
CpG dinucleotide are methylated by adding a methyl 
group to the 5 position of the cytosine pyrimidine ring. 
With no loss of generality, allele A 1 is a cytosine which 
is, if any, methylated into a new "allele" called the epial- 
lele, denoted as A e) at a rate u. After DNA methylation, 
the population frequencies of non-methylated A 1 allele, 
epiallele A e and allele A 2 are (1 - u)p, up, and q, respect- 
ively. Current technologies allow the distinction of 
epialleles from non- methylated alleles. The process of 
methylation and the resulting frequencies of six distin- 
guishable genetic and epigenetic types are expressed as 



Genotype/epigenotype Frequency Observation 

, A\Ay No methylation , (1 — u) 2 p 2 + D u + Di e ( n\\ 

AxAx^l A\A e One methylation < 2u{\ - u)p 2 - 2D Xe 

k A e A e Two methylations ^ u 2 p 2 + D\ e + D 2e 

iA 2 No methylation f 2(1 - u)pq - 2D U 

\ 2 A e One methylation 1 2upq — 2D 2e [ n 2e 

A 2 A 2 P A 2 A 2 No methylation q 2 + D l2 + D 2e 



A\A 2 



\A 2 / 



p 

{ n v . 
\ n% 

n 22 



(i) 



where D 12 , D le , and D 2e are the coefficients of Hardy- 
Weinberg disequilibrium (HWD) due to a non-random 
association between alleles A 1 and A 2 , between allele A x 
and epiallele A ei and between allele A 2 and epiallele A & re- 
spectively. It is possible that the previous equilibrium of 
the population is violated by DNA methylation, leading to 
the HWD quantified by D 12) D le) and D 2e . Thus, the geno- 
type and epigenotype frequencies may be determined by 
allele and epiallele frequencies and HWD coefficients. 

Let n ll9 n le , n ee , n 12 , n 2e , and n 22 (n 11 +n le +n ee +n 12 + 
K2e+ n 22 = n) denote the observations of the cor- 
responding genotypes/epigenotypes (1) in the study 
population. Based on the frequencies of these genotypes/ 



epigenotypes, we formulate a polynomial likelihood from 
which to obtain the maximum likelihood estimates 
(MLEs) of the allele frequencies, the occurrence fre- 
quency of methylation, and HWD using 



P 



nn + nu + n ee + \{nn + n 2e ) 
n 

n ee + \{n\e + n 2e ) 
nn + nu + n ee + \{m 2 + n 2e ) 

n 22 + \(ni 2 + n 2e ) 



bi e = — ll)p 2 



D 2e = upq 



n^ 
2n 



D 12 = (1 - u)pq 



2n 



nn 
2n 



(2) 
(3) 
(4) 
(5) 
(6) 
(7) 



We are interested in investigating whether there is sig- 
nificant occurrence of DNA methylation at the nucleo- 
tide site. This can be tested by formulating a null 
hypothesis, H 0 : u = 0, vs. an alternative hypothesis, 
H 0 : u * 0, under each of which the likelihoods (L 0 and L 2 ) 
are calculated, respectively. However, because the u value 
in the H 0 lies on the boundary of parameter space, the 
log-likelihood ratio calculated, 

LR = -2(log L 0 -log Li), 

may not follow a standard chi-square distribution. Self and 
Liang [22] showed that the null distribution of the LR test 
statistic is a mixture of projections of chi-square variables 
onto surfaces, with the ^eights of mixtures that can be 
derived analytically only in special cases. By establishing 
the asymptotic null and alternative distributions of quasi- 
likelihood ratio, rescaled quasi-likelihood ratio, Wald, and 
score tests, Andrews [23] suggested the use of these test 
statistics to test the boundary value of a model parameter. 
While the first three test statistics are easy to com- 
pute, the score test is more difficult by deriving the 
first and second-order derivatives of the alternative 
log-likelihood. 

Similar tests can be performed for individual HWD, 
Di e) D 2e , or £) 12 , or their combinations, by formulating 
the null hypotheses, respectively. Under the alternative 
hypothesis H x associated with each null hypothesis 
considered, the likelihood is calculated. The LR value 
calculated is thought to be asymptotically chi-square 
distributed with the degree of freedom equal to the dif- 
ference in the number of parameters to be estimated 
between the alternative and null hypotheses. 
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Genetic and epigenetic effect 

We assume that the study population is investigated 
under a uniform condition so that the phenotypic vari- 
ation can be simply partitioned into genetic/epigenetic 
components and errors. There are only three genotypes, 
A ± A 1} A 1 A 2 , and A 2 A 2 , prior to DNA methylation. Let a 
denote the additive effect of the nucleotide site due 
to the substitution of allele A x by A 2 or vice versa and 
d denote the dominant effect due to the interaction 
between the two alleles. The values of three genotypes 
are diagrammed over an axis as follows: 



Genotype A 2 A 2 
Genotypic value fi-a 
Net genotypic value -a 



AiA 2 AiAi 
ft ft + d ft + a 
0 d a 



(8) 



Origin 

As described above, allele A x is assumed to be methy- 
lated into the epiallele A e . The values of six distinguish- 
able genetic and epigenetic types are expressed as 



Genotype / epigenotype 

{A \A\ No methylation 
AiA e One methylation 
A e A e Two methylations 
No methylation 
One methylation 
A 2 A 2 P A 2 A 2 No methylation 



A X A 2 



J Al A 2 
\A 2 A e 



Expected Value 

{jA + d\ 
jA + \(cii + a e ) + a 

{pi — \a e + d\ 2 
pi — \a\+ d 2e 



Estimated Value 

i=i yi/nie 

E«12 , 
i=i yilni2 

E«22 , 
i=l yi/»22 

(9) 

where the genotypic value of the trait is decomposed 
into different components, i.e., the overall mean the 
additive effects due to the substitution of allele Ai (ai) 
and epiallele A e by allele A 2 (a e ), and the dominance 
effects due to the interaction between allele A x and 
epiallele A e (di e ), between allele A Y and allele A 2 (di 2 ) 
and between allele A 2 and epiallele A e {d 2e ). 

Let y t denote the phenotypic value of the trait for indi- 
vidual i (i =1, . . ., n) in the study population. The MLEs 
of the genotypic value for each genotype/epigenotype 
can be obtained by simply taking its mean over all 
individuals belonging to this genotype/ epigenotye (9). 
The genetic and epigenetic effects can be estimated by 
solving a group of regular equations for the genotypic 
values (9), i.e., 



ci\ 



■ \~^ W H /\~^ W ee \~^ W 22 

2 2^i=^ ( 2^ , 2^=1* 



n 2 2 



■ V~^^ee /V^^H V~^ W 22 \ 



n ee y nu n 22 

Enie /\~^n n \~^n ee \ 



«le 



"11 



(10) 

(11) 
(12) 



En-ie /V^ w 22 \-^n ee \ 

(13) 

n 2e 2 y n 22 n ee J 

E«12 /V~^ W H V~^ W 22 \ 

"12 2 y nu n 22 J 

Each of these effects (10) - (14) can be tested by the 
log-likelihood ratio approach. For an epigenetic study, 
we are more interested in testing the epigenetic effect of 
the nucleotide site a e and dominant effects due to the 
interactions between the alleles and epiallele d Xe and d 2e . 
The log-likelihood ratio test statistics for each hypothesis 
test is thought of being asymptotically chi-square 
distributed with the degree of freedom equal to the dif- 
ference in the number of parameters to be estimated 
between the alternative and null hypotheses. 

Genetic and epigenetic variation 

We first give the genetic variance explained by the nucleo- 
tide site studied prior to DNA methylation. By defining a 
new parameter called the average effect a = a + (q-p)d 
[20], we derived the overall genetic variance of the trait due 
to this site as 



(9) 

2pqa 2 + (2pqd) 2 =a 2 a + a 2 



(15) 



where (? a = 2pqa 1 is the additive genetic variance depending 
on both a and d, and = (2pqd) 2 is the dominant genetic 
variance only depending on d. Both additive and domin- 
ance variances are affected by the relative magnitudes of al- 
lele frequencies p and q. These two variances reach their 
maximums when two alternative alleles A x and A 2 occur at 
the same frequency. 

In what follows, we model how the epigenetic change 
contributes to the genetic variance of a complex trait 
based on the frequencies (1) and values of genotypes/ 
epigenotypes (9). The total genetic variation among the 
six genotypes/epigenotypes is derived as 

o% = a\ [(1 - ufp 2 + Di 2 + Di e ] + a 2 e [u 2 p 2 + D Xe + D 2e ] 
+ (ai + a e ) 2 [q 2 + D u + D 2e ] + \^(cii + a e ) + 
x [2u(l - u)p 2 - 2Di e ] + [-^i + d 2e ] 2 

x [2upq - 2D 2e ] + \^fe + ^12] 

x [2(1 - u)pq - 2D 12 ] - m 2 (16) 

where m is the population mean expressed as 

m = ai [(1 — u)p — q] + a e (up — q) 

+ 2di e — u)p 2 — Die] + 2d 2e (upq — D 2e ) 
+ 2di 2 [(l-u)pq-D u } 

It can be seen from equation (16) that the total genetic 
variance includes 15 different parts, i.e., 
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= g Additive effect of the original alleles prior to methylation 

+ G 2 ae Additive effect of the epiallele 

+ G 2 die Domiant effect between the original allele and epiallele 

+ G 2 d2e Domiant effect between the original allele and epiallele 

+ G du Domiant effect between the original alleles 

+ G 2 aiXae Multiplicative additive x additive effect involving the epiallele 

+ o 1 d Multiplicative additive x dominant effect involving the epiallele 

+ ^a 1 xd 2 e Multiplicative additive x dominant effect involving the epiallele 

+ G 2 iXdn Multiplicative additive x dominant effect with no epiallele 

+ a l e xdi e Multiplicative additive x dominant effect involving the epiallele 

+ ^a e xd 2 e Multiplicative additive x dominant effect involving the epiallele 

+ a a e xd 12 Multiplicative additive x dominant effect involving epiallele 

+ a diexd 2e Multiplicative dominant x additive effect involving the epiallele 

+ a d le xd 12 Multiplicative additive x dominant effect involving the epiallele 

-\- G 2 d xd Multiplicative additive x dominant effect involving the epiallele 



Here, we define a new heritability, called the epi- 
genetic heritability, which describes the proportion of 
the phenotypic variance explained by the effect of the 
epiallele and its interactions with the other effects, 
expressed as 



Hi 



a\ xdi: 



(17) 



Also, we use the proportion of the epigenetic variance 
to the total genetic variance to describe the relative con- 
tribution of epigenetic methylation to the overall genetic 
variance, expressed as 



Rt 



G 



a\ xd\ 2 



(18) 



These two parameters can be used to assess the con- 
tribution of DNA methylation to the total phenotypic 
variation of a quantitative trait. 

Numerical analysis 

In this section, we performed numerical analyses to in- 
vestigate how epigenetic marks contribute to the herit- 
ability of a complex trait. The occurrence of epigenetic 
marks is described by population genetic parameters in- 
cluding the occurrence rate of the epiallele and its 
Hardy- Weinberg disequilibria with unmarked alleles. 
The effect of epigenetic marks can be specified by quan- 
titative genetic parameters including the epigenetic effect 
of the epiallele and its interactions with other effects. 
As analyzed above, population genetic parameters (p, q, u, 
D\ e) D 2e > D 12 ) and quantitative genetic parameters (a l7 a e , 



d\ e , d 2e , d 12 ) contribute to the genetic variance in a com- 
plex way (16). We will analyze the contribution of 
epigenetic marks by separately investigating how these 
population and quantitative genetic parameters affect R 2 . 

Population genetic effect 

Suppose there is a study population in which methylated 
sites are observed for a phenotypic trait. Consider a nu- 
cleotide site with two alleles A Y and A 2 , one of which, 
say A lf is methylated at a rate u (u takes any value in 
[0,1]). This methylation may violate the previous HWE 
assumption. Based on a simple algebraic analysis, we 
obtain the intervals of D le , D 2e and D 12 as follows: 

-- [{l-ufp 2 + u 2 q 2 + (£>i2 + D 2e )]<D le <(l-u)p 2 
-- [u 2 p 2 + q 2 + (D le + D u )] <D 2e <upq 
- l -[(l-ufp 2 + q 2 + (D u +D 2e )]<D 12 <(l-u)pq 

Because of DNA methylation, the change of the gen- 
etic variance explained by the site takes place. By fixing 
quantitative genetic parameters, we quantitatively exam- 
ined the impacts of different occurrence rates of methy- 
lation and different HWD coefficients on the epigenetic 
ariance. A small value of occurrence rate may lead to 
the formation of substantial epigenetic variance, al- 
though this phenomenon depends on the disequilib- 
rium degree of association between two original alleles 
produced following methylation (Figure 1). The epigenetic 
variance is also positively associated with the degree of 
disequilibrium for the unmarked alleles and epiallele 
(Figure 2). 
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Figure 1 Change of the proportion of the epigenetic variance 
over the total genetic variance {Rfl as a function of the 
occurrence rate of methylation in a natural population. The 

total and epigenetic genetic variances are calculated by assuming 
population genetic parameters [p, q, u, D 1e , D 2e , D u ) = (0.4, 0.6, 
u, 0.05, 0.05, D 12 ) (allowing u and D 12 to change) and quantitative 
genetic parameters {a h a & d ]e , d 2e , d ]2 ) = (0.4, 0.05, 0.05, 0.05, 0.05). 



Quantitative genetic effect 

By fixing population genetic parameters, the influence of 
genetic effects triggered by the epiallele was investigated. 
A small value of the additive effect a e formed by the 
epiallele brings about considerable epigenetic variance 
(Figure 3). This influence increases with increasing a e 
values. The epigenetic variance is also remarkably 
affected by the dominant effect between the original 
alleles and epiallele (Figure 4). It is clear that these effect 
parameters contribute to the epigenetic variance also 
through their complex interactions. 

Computer simulation 

Our model allows the estimation and test of epigenetic 
effects. We carried out simulation studies to examine 
the statistical properties of the model. A study popula- 
tion was simulated by assuming a set of population and 
quantitative genetic parameters and a normally distribu- 
ted residual error with mean zero and variance scaled 
under a range of trait heritabilities. As expected, the esti- 
mation precision increases with increasing sample size 
and heritability. A sample size 400 is sufficient to pro- 
vide reasonable estimates of all population genetic para- 
meters (Table 1). Note that the estimation precision of 
the population parameters does not rely on the size 
of heritability. In general, the reasonable estimation 




-0.05 



-o.i a 



'0.10 



Figure 2 Change of the proportion of the epigenetic 
variance over the total genetic variance {Re) as a function 
of Hardy-Weinberg disequilibrium (HED) coefficients 
formed between the original allele and epiallele in a natural 
population after DNA methylation. The total and epigenetic 
genetic variances are calculated by assuming population genetic 
parameters (p, q, u, D 1e , D 2e , D 12 ) = (0.4, 0.6, u, D 1e , D 2e , 0) (allowing 
u, D 1e , and D 2e to change) and quantitative genetic parameters 
(o h a & di ei d 2e , d u ) = (0.4, 0.05, 0.05, 0.05, 0.05). 




Figure 3 Change of the proportion of the epigenetic 
variance over the total genetic variance {Re) as a function 
of the additive genetic effect due to the substitution of 
the original allele by the epiallele. The total and epigenetic 
genetic variances are calculated by assuming population genetic 
parameters (p, q, u, D 1e , D 2e , D u ) = (0.4, 0.6, 0.2, 0, 0, 0) and 
quantitative genetic parameters (a h o e , d 1e , d 2e , d u ) = {o h a e , 0.05, 
0.05, 0.05) (allowing a y and a e to change). 
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Figure 4 Change of the proportion of the epigenetic 
variance over the total genetic variance [R 2 ) as a function 
of the dominant genetic effect due to the interaction between 
the original allele and epiallele. The total and epigenetic genetic 
variances are calculated by assuming population genetic parameters 
(p, q, u, D ]e , D 2e , D 12 ) = (0.4, 0.6, 0.2, 0.01, 0.01, 0) and quantitative 
genetic parameters (a h a & d ]ei d 2ei d u ) = (0.08, 0.12, d ]ei d 2e , d u ) 
(allowing d ]ei d 2e and d ]2 to change). 

V J 



of quantitative genetic parameters, especially domin- 
ant genetic effects, needs a much larger sample size, 
say 1000 (Table 1). As expected, the estimation preci- 
sion of genetic effects is sensitive to heritability. In 
practice, every effort should be given to precisely 
measure the phenotypic trait, aimed to increase the 
level of heritability. 

We also investigated the power of detecting epiallelic 
HWD occurrence and epigenetic effects as well as the 
false positive rates for epigenetic effect identification 
under different heritabilities and sample sizes (Table 2). 
Given a medium sample size 400, the model possesses 
adequate power (> 0.95) for the detection of small epial- 
leli HWD coefficients, along with small false positive 
rates (< 0.10). The power of the model to detect epigen- 
etic effects was calculated by testing the hypothesis, H 0 : 
a e = di e = d 2e = 0 vs. Hx: at least one of the effects in 
the H 0 is not equal to zero, and comparing the resulting 
log-likelihood ratio test statistic with the critical thresh- 
old of a chi-square distribution with three degrees of 
freedom. The proportion of the number of simulation 
replicates that reject the null hypothesis over the total 
number of simulation replicates is empirically used as 
the power of the model. The power of epigenetic effect 
detection is very sensitive to the magnitude of the epi- 
genetic effect, heritability and sample size (Table 2). 
When the epigenetic effect is small, the model has low 



Table 1 MLEs of population and quantitative genetic 
parameters from simulated data with different 
heritabilities (H 2 ) and sample sizes (n) 





True 


H 2 = 0.05 


H 2 = 0.1 


H 2 = 0.2 


MLE SD 


MLE SD 


MLE SD 


n=400 


0.1 


0.099 (0.019) 


0.100 (0.017) 


0.099 (0.020) 


u 
P 


0.4 


0.399 (0.020) 


0.400 (0.022) 


0.403 (0.018) 


Dn 


0.01 


0.011 (0.010) 


0.008 (0.011) 


0.009 (0.012) 


D 1e 


0.01 


0.010 (0.003) 


0.010 (0.003) 


0.010 (0.003) 


D 2e 


0.01 


0.010 (0.004) 


0.010 (0.005) 


0.010 (0.004) 




1 


1.002 (0.096) 


1 .009 (0.066) 


1 .002 (0.043) 


0i 


0.2 


0.201 (0.113) 


0.193 (0.085) 


0.198 (0.049) 


a e 


0.05 


0.060 (0.181) 


0.064 (0.134) 


0.054 (0.080) 


dn 


0.05 


0.050 (0.076) 


0.049 (0.055) 


0.049 (0.032) 


d, e 


0.05 


-0.015 (0.485) 


-0.008 (0.401) 


0.010 (0.267) 


die 


0.05 


0.027 (0.279) 


0.047 (0.171) 


0.042 (0.116) 


n=1000 


0.1 


0.101 (0.014) 


0.101 (0.013) 


0.102 (0.013) 


u 
P 


0.4 


0.401 (0.012) 


0.400 (0.012) 


0.401 (0.011) 


D u 


0.01 


0.010 (0.007) 


0.009 (0.007) 


0.010 (0.007) 


D le 


0.01 


0.010 (0.003) 


0.010 (0.002) 


0.010 (0.002) 


D 2e 


0.01 


0.010 (0.003) 


0.010 (0.003) 


0.010 (0.003) 




1 


1.003 (0.053) 


0.996 (0.039) 


1 .000 (0.027) 


01 


0.2 


0.202 (0.067) 


0.207 (0.049) 


0.200 (0.031) 


a e 


0.05 


0.051 (0.099) 


0.037 (0.068) 


0.050 (0.050) 


dn 


0.05 


0.051 (0.048) 


0.047 (0.035) 


0.051 (0.023) 


d ]e 


0.05 


0.056 (0.269) 


0.046 (0.191) 


0.038 (0.122) 


die 


0.05 


0.048 (0.158) 


0.057 (0.111) 


0.054 (0.081) 



The MLEs of parameters and their standard deviation (in parentheses) were 
calculated from 200 simulation replicates. 



power to detect it, although the power increases with 
increasing heritability and sample size. To detect a small 
epigenetic effect, a large sample size (2000 or more) is 
required for a precisely measured phenotype (with a 
large heritability). For a medium-size epigenetic effect, 
a sample size 1000 may be adequate for its detection 
if then phenotype is precisely measured. In general, 
the model has reasonably small false positive rates even 
for a medium sample size (Table 2). 

Implementing the epigenetic model into GWAS 

The epigenetic model proposed can be implemented to 
genome-wide association studies (GWAS). In GWAS, it 
is likely that we have a million of methylated sites 
detected throughout the entire genome on a much smal- 
ler number of samples. Moreover, samples collected 
for human GWAS are highly heterogeneous in terms 
of genetic background, gender, age, race, and many 
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Table 2 The power of epigenetic-effect detection by the 
epigenetic model and its false positive rates (FPR) under 
different sample sizes (n) and heritabilities (H 2 ) 





n 


oe = d = d 


H 2 = 0.05 


H 2 = 0.1 


H 2 = 0.2 


Power 


400 


0.05 


0.055 


0.090 


0.160 




1000 


0.05 


0.090 


0.125 


0.355 




2000 


0.05 


0.115 


0.205 


0.630 




5000 


0.05 


0.215 


0.415 


0.975 




1000 


0.1 


0.085 


0.255 


0.780 




2000 


0.1 


0.265 


0.525 


0.975 




5000 


0.1 


0.495 


0.950 


1.00 


FPR 


400 


0.05 


0.050 


0.045 


0.065 




1000 


0.05 


0.060 


0.025 


0.045 




2000 


0.05 


0.030 


0.010 


0.045 




5000 


0.05 


0.085 


0.050 


0.070 




1000 


0.1 


0.045 


0.040 


0.040 




2000 


0.1 


0.055 


0.025 


0.030 




5000 


0.1 


0.050 


0.020 


0.045 



other demographic characteristics. These demographic 
factors should be modeled as covariates. For a single 
methylated site, we can build a linear model to describe 
the phenotypic value of individual i by considering its 
multifactorial determinants, expressed as 

ji = ft + £ n ai + £ i2 a e + £ i3 d le + ^^d 2e 

R S L s 

+ f ,- 5 rfi2 + ^2 arUir + ^2 ^2 XisiVsi + et ( 19 ) 

r=l 5=1 1=1 

where £ a , . . ., £ 5 are the indicator variable for subject 
i that corresponds to a specific genetic or epigenetic 
effect at a methylated site, u ir (r = 1, . . R) is the value 
of the rth continuous covariate, such as age and BMI, 
for subject r, a r is the effect of the rth continuous covari- 
ate, v s i {I = 1, . . L & s = 1, . . S) is the effect of the /th 
level for the 5th discrete covariate, such as race, gender, 
and treatment, with J] f=\vsl = 0 where L s is the number 
of levels for the 5th discrete covariate, x is i is an indicator 
variable of subject i who receives the /th level of the 5th 
discrete covariate, and e t is a random error. 

A standard multiple linear regression approach can be 
used to estimate all the effects described in model (19). 
If the test is made individually for each of the meth- 
ylated sites, the significance of each effect should be 
adjusted by multiple comparison approaches such as 
Bonferroni or FDR. 

Analysis of one single methylated site at a time is 
limited for statistical inference about a comprehensive pic- 
ture of the genetic and epigenetic architecture of complex 
phenotypes. The best way such a picture is illustrated is to 
analyze all sites simultaneously. Li et al. [24] proposed a 



new approach by incorporating the least absolute shrinkage 
and selection operator (lasso) [25] to simultaneously 
analyze a larger number of variables using a much smaller 
sample size. A detailed algorithm for the Bayesian lasso 
has been derived [24] and can be readily implemented to 
GWAS aimed to identify epige- netic variants. 

Discussion 

Epigenetic alternations have been increasingly recognized 
to play an important role in generating and maintaining 
quantitative genetic variation for complex phenotypes 
underlying physiology and diseased [6,7,9,26-28]. Prelim- 
inary estimates in plants suggest that it can account for 
up to 30% of the variation in commonly studied pheno- 
types such as height and flowering time [8]. Many theoret- 
ical models have been available to analyze the contributions 
of epigenetic marks to missing heritability in genome-wide 
association studies (GWAS) [14-18]. In this article, we 
extended Mendelian inheritance-based genetic principles to 
derive a quantitative framework by which to analyze the 
pattern of how DNA methylation contributes to overall 
genetic variance. By defining several epigenetic effect para- 
meters, the analytical framework allows the mechanistic 
characterization of epigenetic actions within the quantita- 
tive genetic context. 

Through numerical analysis, a small incidence of DNA 
methylation as well as a small effect due to methylation 
alternations could lead to a substantial increase of gen- 
etic variance, suggesting that epigenetic marks may be 
an important cause for genetic diversity in nature. Given 
our finding, the neglection of epigenetic variants in 
many current GWAS may partly explain the problem of 
missing heritability [17]. Simulation studies suggest that 
the model can provide reasonable estimates of epigenetic 
effect parameters with a sample size of 200 - 400, even 
when the trait studied has a small heritability. It should 
be pointed out, however, that this conclusion is based 
on a well-controlled study in which there are few back- 
ground noises. For the GWAS in humans, the estimated 
genetic variation is likely to be confounded by many fac- 
tors, such as population structure, heterogeneous genetic 
background, demographic complexity, and highly noisy 
phenotypic measurements among others. To remove 
these confounding effects from genetic and epigenetic 
analysis, a considerably large sample size may be needed. 

The model only considers a single methylated site. 
However, there is no technical difficulty in extending the 
model to explore two or more sites at the same time 
which may interact with each other to produce a com- 
plex network of epistasis [29]. For two methylated sites, 
a total of 25 interaction parameters are formed between 
parameter sets each composed of (a lf a e , di e , d 2e > dyi) 
for each site. In this case, an exponentially increasing 
sample size and more precise phenotypic measurement 
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(aimed to increase the trait s heritability) are needed. For 
the methylated population, originally existing HWE as- 
sumption may be violated in which case it is not possible 
to use gametic linkage disequilibria to specify the associ- 
ation between the two sites. Wu et al. [30] proposed a 
robust approach to analyze the marker-marker associ- 
ation by deriving a so-called zygotic linkage disequilib- 
rium model. Wu et al.s approach can be incorporated to 
identify the contribution of epigenetic marks at two sites 
to the overall genetic variance. 

Epigenetic changes may be an adaptation to environ- 
mental perturbations [5,17,28]. Thus, it is crucial to 
incorporate the epigenetic model into a genotype- 
environment interaction study. By doing so, we can 
identify which and how epigenetic effects interact with 
the environment to determine final phenotypes so that 
the genetic etiology of quantitative variation can be 
better elucidated. In addition, there is a considerable 
body of evidence that epigenetic effects may transmitted 
from one generation to next [31,32], although other 
studies found the reprogramming of epigenetic effects 
during meiosis [5,33,34]. By embedding our epigenetic 
model into a family-based design, we can develop a 
powerful approach to test the relative importance of 
these two phenomena in trait control [35-37]. Trad- 
itional models analyze the inheritance of quantitative 
traits based on Mendel's laws, failing to study the contri- 
bution of epigenetic modifications. In addition, many 
GWAS are based on a case-control study in which 
genotype frequencies are compared between two groups. 
To study the association between epigenetic effects and 
a particular disease, such as cancer, we can incorporate 
quantitative epigenetic models as described by equations 
(10) - (14) into a case-control framework, allowing each 
effect to be tested. The integration of general quantita- 
tive genetic models and a case-control design has been 
discussed and its statistical properties investigated 
through analytical derivations and computer simulations 
[38-40]. With these extensions, the new model proposed 
in this article by integrating traditional quantitative gen- 
etic theory and the latest discoveries of epigenetic effects 
will allow geneticists to chart a more comprehensive 
picture of the genetic landscape for complex pheno- 
types underlying agricultural production, physiology and 
human diseases. 
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